R Introduction Workshop
May 08, 2019
Load crime.csv using read.table() or read.csv() and assign it to the variable crime.
What is the mean murder rate in the US according to the crime data?
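As a hint for the exercise, here is a minimal sketch of the loading pattern. Because crime.csv is not reproduced in these notes, the sketch writes a tiny toy file first; the column names (state, murder) and all values are invented for illustration and will differ from the real file.

```r
# Toy stand-in for crime.csv: the column names and values are invented.
toy_path <- tempfile(fileext = ".csv")
writeLines(c("state,murder", "A,2", "B,4", "C,6"), toy_path)

# read.csv() assumes a header row and comma separators by default.
crime_demo <- read.csv(toy_path)

# read.table() needs both stated explicitly to read the same file.
crime_demo2 <- read.table(toy_path, header = TRUE, sep = ",")

# Mean of one numeric column; na.rm = TRUE skips missing values.
mean_murder <- mean(crime_demo$murder, na.rm = TRUE)
mean_murder
```

With the real file, replace toy_path with the path to crime.csv and use names(crime) to find the actual murder-rate column.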
Load the first sheet of titanic.xlsx using the Import Dataset button in RStudio.
In total, how many females perished on the Titanic?
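One way to approach a counting question like this is to filter the rows and add up the counts. The sketch below uses R's built-in Titanic table as a stand-in, on the assumption that it is organized differently from titanic.xlsx, so the column names and the total may not match the spreadsheet.

```r
# Built-in contingency table, used here as a stand-in for titanic.xlsx.
titanic_df <- as.data.frame(Titanic)  # columns: Class, Sex, Age, Survived, Freq

# Keep the rows for females who did not survive, then add up the counts.
perished_f <- subset(titanic_df, Sex == "Female" & Survived == "No")
sum(perished_f$Freq)
```

If the spreadsheet has one row per passenger instead of a Freq column, counting the filtered rows with nrow() plays the role of sum() here.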
There are 3 main families of visualization functions:
- plot() (base graphics): see ?plot
- lattice: see ?lattice::Lattice
- ggplot2: see ?ggplot2::ggplot2
Basic plot syntax:
plot(x, y)
x: vector for the x axis, y: vector for the y axis
See ?plot
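A minimal sketch of the basic syntax, using made-up vectors x and y. The second half shows par(mfrow = ...), one common way to place several base plots on the same device; the 1-by-2 layout is just an example choice.

```r
x <- 1:10
y <- x^2

# Scatter plot of y against x, with an optional title and axis labels.
plot(x, y, main = "y = x^2", xlab = "x", ylab = "y")

# par(mfrow = c(rows, cols)) splits the device into a grid of panels.
# par() returns the previous settings so they can be restored afterwards.
old_par <- par(mfrow = c(1, 2))
plot(x, y, type = "p")  # points
plot(x, y, type = "l")  # line
par(old_par)
```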
Use par() to plot multiple plots.
plot() vs ggplot()
A picture is worth a thousand words, when the picture is good.
ggplot()
ggplotly()
With ggplot2, you specify how to map variables to aesthetics and what graphical primitives to use, and it takes care of the details.
# The easiest way to get ggplot2 is to install the whole tidyverse:
install.packages("tidyverse")
# Alternatively, install just ggplot2:
install.packages("ggplot2")
# Don't forget to load tidyverse into your environment
library(tidyverse)
# Or just ggplot2
library(ggplot2)
The main components of a ggplot2 plot:
- ggplot()
- aes()
- geom_ functions
- scale_ functions, or labs() and lims()
- facet_ functions
- coord_ functions
iris
iris data
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
## 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
## Median :5.800 Median :3.000 Median :4.350 Median :1.300
## Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
## 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
## Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
## Species
## setosa :50
## versicolor:50
## virginica :50
##
##
##
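The two printouts above look like the first rows and the summary of the built-in iris data. The calls that produced them are not shown in these notes, so the following is an assumption about how to get the same kind of output:

```r
# First three rows of the built-in iris data frame.
head(iris, 3)

# Per-column summary: quartiles and mean for numeric columns,
# counts per level for the factor column Species.
summary(iris)

# Number of rows and columns.
dim(iris)
```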
aes()
geom_
color
color + size
p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p + geom_point(aes(color=Species, size=Petal.Length))
color + size + alpha (transparency)
p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p + geom_point(aes(color=Species, size=Petal.Length, alpha=Petal.Width))
color + size + alpha + shape
p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p + geom_point(aes(color=Species, size=Petal.Length, alpha=Petal.Width, shape=Species))
p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p + geom_point(aes(color=Species, size=Petal.Length, alpha=Petal.Width, shape=Species)) +
  guides(color = guide_legend(ncol = 3, byrow = TRUE),
         size = guide_legend(ncol = 3, byrow = TRUE),
         alpha = guide_legend(ncol = 3, byrow = TRUE))
geom: point + smooth
p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p + geom_point(aes(color=Species, size=Petal.Length, alpha=Petal.Width)) +
  geom_smooth()
What will this give me?
geom: point + smooth
p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p + geom_point(aes(color=Species, size=Petal.Length, alpha=Petal.Width)) +
  geom_smooth()
Oops! What happened?
geom: point + smooth
p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length, color=Species))
p + geom_point(aes(size=Petal.Length, alpha=Petal.Width)) +
  geom_smooth()
Why did this work now?
Can you see the difference?
geom: point + smooth
p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p + geom_point(aes(size=Petal.Length, alpha=Petal.Width)) + geom_smooth(aes(color=Species))
What about this? What's happening here?
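One way to check what is going on in the examples above is to count how many groups the smooth layer is fitted on. Aesthetics set in ggplot() are inherited by every layer, while aesthetics set inside a geom_ call apply to that layer only. This sketch swaps the default smoother for method = "lm" (a straight-line fit) purely to keep it quiet and fast; the names p_local and p_global are illustrative.

```r
library(ggplot2)

# Case 1: color is mapped only inside geom_point().
p_local <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point(aes(color = Species)) +
  geom_smooth(method = "lm", formula = y ~ x)

# Case 2: color is mapped in ggplot(), so every layer inherits it.
p_global <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, color = Species)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# layer_data(plot, 2) returns the data computed for the second layer (the smooth).
groups_local  <- length(unique(layer_data(p_local, 2)$group))
groups_global <- length(unique(layer_data(p_global, 2)$group))
c(local = groups_local, global = groups_global)
```

In case 1 the smooth layer sees no grouping variable, so a single curve is fitted through all points. In case 2 it is fitted once per species. Putting aes(color=Species) inside geom_smooth() instead groups the smooth layer only and leaves the points uncolored.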
Let's generate a hypothetical iris dataset with added ecosystem type and precipitation data.
ecosys <- sample(c("Forest", "Riparian", "Urban"), size = 150, replace = TRUE)
precp <- sample(c("Heavy", "Mild"), size = 150, replace = TRUE)
iris2 <- cbind(iris, Ecosystem=ecosys, Precipitation=precp)
head(iris2)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species Ecosystem
## 1 5.1 3.5 1.4 0.2 setosa Forest
## 2 4.9 3.0 1.4 0.2 setosa Urban
## 3 4.7 3.2 1.3 0.2 setosa Riparian
## 4 4.6 3.1 1.5 0.2 setosa Forest
## 5 5.0 3.6 1.4 0.2 setosa Riparian
## 6 5.4 3.9 1.7 0.4 setosa Urban
## Precipitation
## 1 Heavy
## 2 Mild
## 3 Heavy
## 4 Mild
## 5 Mild
## 6 Heavy
iris2
Now, I would like to see how my previous graph changes for the different types of ecosystem and precipitation.
This was the graph:
- I am not using geom_smooth for now because I do not have enough data points for model prediction.
- Also I will remove the alpha aesthetic to make it easier for us to see.
p2 <- ggplot(data=iris2, aes(x=Sepal.Width, y=Sepal.Length, color=Species))
p2 <- p2 + geom_point(aes(size=Petal.Length)) # + geom_smooth()
p2
You get the idea here, right?
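The faceting code itself did not survive in these notes, so here is a minimal sketch of one way to split the graph by both variables, using facet_grid() with precipitation in rows and ecosystem in columns. The seed value and the name p2_grid are arbitrary choices for this example, and iris2 is rebuilt so the block runs on its own.

```r
library(ggplot2)

set.seed(42)  # arbitrary seed, only to make the random labels repeatable
iris2 <- cbind(
  iris,
  Ecosystem = sample(c("Forest", "Riparian", "Urban"), size = 150, replace = TRUE),
  Precipitation = sample(c("Heavy", "Mild"), size = 150, replace = TRUE)
)

p2 <- ggplot(data = iris2, aes(x = Sepal.Width, y = Sepal.Length, color = Species)) +
  geom_point(aes(size = Petal.Length))

# rows ~ columns: one panel per Precipitation x Ecosystem combination.
p2_grid <- p2 + facet_grid(Precipitation ~ Ecosystem)
p2_grid
```

Swapping the two sides of the formula transposes the grid. Writing facet_grid(. ~ Ecosystem) or facet_grid(Precipitation ~ .) facets by a single variable.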
wages
You can use facet_wrap if you want to facet by just one variable but still want to arrange the panels nicely.
## # A tibble: 4 x 7
## earn height sex race ed age age_cat
## <dbl> <dbl> <chr> <chr> <dbl> <dbl> <fct>
## 1 79571. 73.9 male white 16 49 (43.9,51.2]
## 2 96397. 66.2 female white 16 62 (58.5,65.8]
## 3 48711. 63.8 female white 16 33 (29.3,36.6]
## 4 80478. 63.2 female other 16 95 (87.7,95.1]
Or you can specify the number of rows and columns for the faceting.
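A minimal sketch of both ideas with facet_wrap(). Since wages.csv is not reproduced in these notes, the built-in iris data is used as a stand-in, and the names p_base, p_wrap and p_wrap_rows are illustrative.

```r
library(ggplot2)

p_base <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point()

# One panel per level of a single variable; ggplot2 picks the layout.
p_wrap <- p_base + facet_wrap(~ Species)

# Same panels, but the layout is fixed explicitly.
# ncol works the same way for the number of columns.
p_wrap_rows <- p_base + facet_wrap(~ Species, nrow = 2)
p_wrap_rows
```

For the wages data, the same pattern would apply with a categorical column such as age_cat in place of Species.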
Plot the wages.csv data like the following